Introduction

This code visualizes Statcounter’s dataset on social media usage in South Korea and Japan over the last year.

Data Import

First, load the libraries for data wrangling and visualization.

# Load libraries
library(tidyverse)
library(RColorBrewer)

Next, I’ll import both datasets from Statcounter.

# Import South Korea data
sm_sk <- read_csv("data/SM_SK.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   Date = col_character(),
##   Facebook = col_double(),
##   Twitter = col_double(),
##   YouTube = col_double(),
##   Pinterest = col_double(),
##   Instagram = col_double(),
##   Reddit = col_double(),
##   Tumblr = col_double(),
##   LinkedIn = col_double(),
##   VKontakte = col_double(),
##   Vimeo = col_double(),
##   StumbleUpon = col_double(),
##   Other = col_double()
## )
# Import Japan data
sm_jpn <- read_csv("data/SM_JPN.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   Date = col_character(),
##   Twitter = col_double(),
##   Pinterest = col_double(),
##   Facebook = col_double(),
##   YouTube = col_double(),
##   Instagram = col_double(),
##   Tumblr = col_double(),
##   Reddit = col_double(),
##   LinkedIn = col_double(),
##   VKontakte = col_double(),
##   news.ycombinator.com = col_double(),
##   `Sina Weibo` = col_double(),
##   mixi = col_double(),
##   Vimeo = col_double(),
##   Fark = col_double(),
##   StumbleUpon = col_double(),
##   Other = col_double()
## )

Data Preparation

Because this is time-series data, R has to be told explicitly that the Date column holds dates before it can be plotted properly. R’s Date class requires a day, but the data only gives the year and month.

So I will arbitrarily treat each observation as the first day of the month. The day won’t show up on the graph.

# Set data classes for South Korea
sm_sk$Date <- paste("01", sm_sk$Date, sep="-")
sm_sk$Date <- as.Date(sm_sk$Date, format='%d-%Y-%m')
class(sm_sk$Date)
## [1] "Date"
# Set data classes for Japan
sm_jpn$Date <- paste("01", sm_jpn$Date, sep="-")
sm_jpn$Date <- as.Date(sm_jpn$Date, format='%d-%Y-%m')
class(sm_jpn$Date)
## [1] "Date"
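
To see why the day has to be added, here is a minimal base R sketch. The string "2020-05" is a made-up value in the same year-month shape that the format string above implies, not a row taken from the dataset.

```r
# Hypothetical year-month string shaped like the raw Date column
ym <- "2020-05"

# Without a day, as.Date() cannot build a complete date and returns NA
no_day <- as.Date(ym, format = "%Y-%m")

# Prepending an arbitrary day gives a string that parses cleanly
with_day <- as.Date(paste("01", ym, sep = "-"), format = "%d-%Y-%m")

is.na(no_day)             # TRUE: the month alone is not enough
format(with_day, "%b %Y") # the day is dropped again when formatting labels
```

Because the axis labels later use the "%b %Y" format, the arbitrary day never appears on the graph.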

Tidying the Data

Now, for ease of plotting, we are going to make our data tidy. First, I define which social media platforms to show on the graph. The Statcounter website doesn’t show every platform in the dataset: it shows (presumably) the top 7 plus “Other” and appears to ignore the remaining columns. Just keep in mind that there is some data we won’t be plotting.

# List the platforms that will be on the graph
sm_list <- c("Twitter", "Pinterest", "Facebook", "Instagram", "Tumblr", "YouTube", "Reddit", "Other")
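
Before selecting columns, it can help to confirm that every chosen platform exists in both files and to see which columns will be left out. The sketch below is only an illustration: the vectors sk_cols and jpn_cols are hypothetical names, typed in by hand from the column specifications printed during import. With the real data you could use names(sm_sk) and names(sm_jpn) instead.

```r
# Platforms chosen for the graph
sm_list <- c("Twitter", "Pinterest", "Facebook", "Instagram",
             "Tumblr", "YouTube", "Reddit", "Other")

# Column names copied by hand from the import messages (illustrative)
sk_cols <- c("Date", "Facebook", "Twitter", "YouTube", "Pinterest",
             "Instagram", "Reddit", "Tumblr", "LinkedIn", "VKontakte",
             "Vimeo", "StumbleUpon", "Other")
jpn_cols <- c("Date", "Twitter", "Pinterest", "Facebook", "YouTube",
              "Instagram", "Tumblr", "Reddit", "LinkedIn", "VKontakte",
              "news.ycombinator.com", "Sina Weibo", "mixi", "Vimeo",
              "Fark", "StumbleUpon", "Other")

# Every plotted platform should be present in both datasets
all(sm_list %in% sk_cols)
all(sm_list %in% jpn_cols)

# Columns that will not be plotted for each country
sk_dropped  <- setdiff(sk_cols,  c("Date", sm_list))
jpn_dropped <- setdiff(jpn_cols, c("Date", sm_list))

# Columns that only the Japan file has
jpn_only <- setdiff(jpn_cols, sk_cols)
```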

Now we can tidy the data. First, we select the Date column and the social media platforms listed above. Then we use pivot_longer to reshape the data from wide to long, so that each row holds one date, one platform, and that platform’s percentage.

# Tidy South Korea data
# all_of() tells tidyselect that sm_list is an external character vector
sm_sk <- sm_sk %>%
  select(Date, all_of(sm_list)) %>%
  pivot_longer(all_of(sm_list),
               names_to = "SocialMedia",
               values_to = "PercentUsed")

# Tidy Japan data
sm_jpn <- sm_jpn %>%
  select(Date, all_of(sm_list)) %>%
  pivot_longer(all_of(sm_list),
               names_to = "SocialMedia",
               values_to = "PercentUsed")
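
To make the reshaping concrete, here is a small sketch on a toy table. The data frame wide and its numbers are invented for illustration and are not Statcounter values.

```r
library(tidyverse)

# Toy wide table: one row per month, one column per platform (made-up values)
wide <- tibble(
  Date     = as.Date(c("2020-05-01", "2020-06-01")),
  Facebook = c(60, 58),
  Twitter  = c(25, 27)
)

platforms <- c("Facebook", "Twitter")

# Long table: one row per month and platform
long <- wide %>%
  pivot_longer(all_of(platforms),
               names_to = "SocialMedia",
               values_to = "PercentUsed")

long
```

Two months and two platforms give four rows, with the old column names stored in SocialMedia and the cell values in PercentUsed. This is the shape ggplot needs to map color and group to a single variable.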

Visualize Data

Finally, we will plot the tidied data. I wrap each plot in plotly::ggplotly to make it interactive. The group is explicitly set to the social media platform, and the points and lines share the same color. I set the y-axis range to 0-100 so that you can see each platform’s share on the full percentage scale.

# South Korea graph
plotly::ggplotly(
  sm_sk %>%  
    ggplot(aes(Date, PercentUsed, group = SocialMedia, color=SocialMedia)) + 
    geom_point() +
    geom_line() +
    theme(axis.text.x = element_text(angle = 45),
          axis.title.x = element_blank(),
          text=element_text(family="TT Times New Roman")) +
    scale_x_date(date_labels="%b %Y",date_breaks  ="1 month") +
    scale_color_brewer(palette = "Dark2") +
    ylab("Used by % of Users") +
    labs(color = "Platform") +
    ylim(0, 100)
)
# Japan graph
plotly::ggplotly(
  sm_jpn %>%  
    ggplot(aes(Date, PercentUsed, group = SocialMedia, color=SocialMedia)) + 
    geom_point() +
    geom_line() +
    theme(axis.text.x = element_text(angle = 45),
          axis.title.x = element_blank(),
          text=element_text(family="TT Times New Roman")) +
    scale_x_date(date_labels="%b %Y",date_breaks  ="1 month") +
    scale_color_brewer(palette = "Dark2") +
    ylab("Used by % of Users") +
    labs(color = "Platform") +
    ylim(0,100)
)
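
Since the two plots differ only in their input data, the shared layers could be moved into a helper function. The sketch below is one possible way to do that, not the code used for the graphs above: plot_platform_share is a hypothetical name, the toy data is invented, and the font setting is left out to keep the example portable.

```r
library(ggplot2)

# Hypothetical helper: builds the line-and-point plot for one tidy dataset
plot_platform_share <- function(data) {
  ggplot(data, aes(Date, PercentUsed, group = SocialMedia, color = SocialMedia)) +
    geom_point() +
    geom_line() +
    theme(axis.text.x = element_text(angle = 45),
          axis.title.x = element_blank()) +
    scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
    scale_color_brewer(palette = "Dark2") +
    labs(y = "Used by % of Users", color = "Platform") +
    ylim(0, 100)
}

# Made-up tidy data in the same shape as sm_sk and sm_jpn
toy <- data.frame(
  Date        = as.Date(c("2020-05-01", "2020-06-01", "2020-05-01", "2020-06-01")),
  SocialMedia = c("Facebook", "Facebook", "Twitter", "Twitter"),
  PercentUsed = c(60, 58, 25, 27)
)

p <- plot_platform_share(toy)
```

With the real data, plotly::ggplotly(plot_platform_share(sm_sk)) and the same call with sm_jpn would then replace the two duplicated blocks.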

Session Info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-2 forcats_0.5.1      stringr_1.4.0      dplyr_1.0.5       
##  [5] purrr_0.3.4        readr_1.4.0        tidyr_1.1.3        tibble_3.1.1      
##  [9] ggplot2_3.3.3      tidyverse_1.3.1   
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.0  xfun_0.22         bslib_0.2.4       haven_2.4.1      
##  [5] colorspace_2.0-0  vctrs_0.3.7       generics_0.1.0    viridisLite_0.4.0
##  [9] htmltools_0.5.1.1 yaml_2.2.1        plotly_4.9.3      utf8_1.2.1       
## [13] rlang_0.4.10      jquerylib_0.1.4   pillar_1.6.0      glue_1.4.2       
## [17] withr_2.4.2       DBI_1.1.1         dbplyr_2.1.1      modelr_0.1.8     
## [21] readxl_1.3.1      lifecycle_1.0.0   munsell_0.5.0     gtable_0.3.0     
## [25] cellranger_1.1.0  rvest_1.0.0       htmlwidgets_1.5.3 evaluate_0.14    
## [29] labeling_0.4.2    knitr_1.33        crosstalk_1.1.1   fansi_0.4.2      
## [33] broom_0.7.6       Rcpp_1.0.6        backports_1.2.1   scales_1.1.1     
## [37] jsonlite_1.7.2    fs_1.5.0          hms_1.0.0         digest_0.6.27    
## [41] stringi_1.5.3     grid_4.0.2        cli_2.5.0         tools_4.0.2      
## [45] magrittr_2.0.1    sass_0.3.1        lazyeval_0.2.2    crayon_1.4.1     
## [49] pkgconfig_2.0.3   ellipsis_0.3.1    data.table_1.14.0 xml2_1.3.2       
## [53] reprex_2.0.0      lubridate_1.7.10  assertthat_0.2.1  rmarkdown_2.7    
## [57] httr_1.4.2        rstudioapi_0.13   R6_2.5.0          compiler_4.0.2